Take Home_Ex03
1. Background
FishEye International, a non-profit focused on countering illegal, unreported, and unregulated (IUU) fishing, has been given access to an international finance corporation’s database on fishing-related companies. In the past, FishEye has determined that companies with anomalous structures are far more likely to be involved in IUU (or other “fishy” business). FishEye has transformed the database into a knowledge graph. It includes information about companies, owners, workers, and financial status. FishEye is aiming to use this graph to identify anomalies that could indicate a company is involved in IUU.
With reference to Mini-Challenge 3 of VAST Challenge 2023, and by using appropriate static and interactive statistical graphics methods, we will help FishEye to better understand fishing business anomalies.
2. Data Source
The data is taken from Mini-Challenge 3 of VAST Challenge 2023.
3. Data Preparation
3.1 Installing and launching R packages
The code chunk below uses p_load() of the pacman package to check whether the packages are installed on the computer. Any that are missing are installed first, and all of them are then loaded into R. The R packages loaded are:
pacman::p_load(jsonlite, tidygraph, ggraph,
               visNetwork, graphlayouts, ggforce,
               skimr, tidytext, tidyverse, patchwork, ggiraph, ggrepel)
3.2 Loading the Data
fromJSON() of the jsonlite package is used to import MC3.json into the R environment.
mc3_data <- fromJSON("data/MC3.json")
The output is called mc3_data. It is a large list R object.
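To see what kind of list fromJSON() returns, here is a minimal sketch on a hypothetical miniature JSON string (the names toy_json and toy_data, and the values in it, are made up for illustration and are not taken from MC3.json). It assumes only that the file has top-level nodes and links elements, as the later sections use.

```r
library(jsonlite)

# Hypothetical miniature with the same top-level elements as used in this exercise
toy_json <- '{
  "nodes": [
    {"id": "Company A", "type": "Company", "country": "ZH"},
    {"id": "Jane Doe", "type": "Beneficial Owner", "country": "ZH"}
  ],
  "links": [
    {"source": "Company A", "target": "Jane Doe", "type": "Beneficial Owner"}
  ]
}'

toy_data <- fromJSON(toy_json)

# Names of the top-level elements and the class of each one
names(toy_data)
sapply(toy_data, class)
```

Arrays of JSON objects are simplified into data frames, which is why the next steps can pass mc3_data$links and mc3_data$nodes straight to as_tibble().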
3.3 Extracting edges
The code chunk below will be used to extract the links data.frame of mc3_data and save it as a tibble data.frame called mc3_edges.
mc3_edges <- as_tibble(mc3_data$links) %>%
  distinct() %>%
  mutate(source = as.character(source),
         target = as.character(target),
         type = as.character(type)) %>%
  group_by(source, target, type) %>%
  summarise(weights = n()) %>%
  filter(source != target) %>%
  ungroup()
3.4 Extracting nodes
The code chunk below will be used to extract the nodes data.frame of mc3_data and save it as a tibble data.frame called mc3_nodes.
mc3_nodes <- as_tibble(mc3_data$nodes) %>%
  mutate(country = as.character(country),
         id = as.character(id),
         product_services = as.character(product_services),
         revenue_omu = as.numeric(as.character(revenue_omu)),
         type = as.character(type)) %>%
  select(id, country, type, revenue_omu, product_services) # select() is used to organise the sequence of columns
4. Data Exploration and Data Wrangling
4.1 Exploring the edges data frame
In the code chunk below, skim() of skimr package is used to display the summary statistics of mc3_edges tibble data frame.
skim(mc3_edges)
| Name | mc3_edges |
| Number of rows | 24036 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| source | 0 | 1 | 6 | 700 | 0 | 12856 | 0 |
| target | 0 | 1 | 6 | 28 | 0 | 21265 | 0 |
| type | 0 | 1 | 16 | 16 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| weights | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | ▁▁▇▁▁ |
The report above reveals that there are no missing values in any field.
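The report also shows that weights has a mean of 1 and a standard deviation of 0, so every edge has weight 1. A minimal sketch on hypothetical toy edges (toy_edges is made up for illustration) suggests why: in the extraction step, distinct() runs before the rows are counted, so each source-target-type group has exactly one row left. Counting before de-duplicating is shown only as a possible alternative, not as the approach used in this exercise.

```r
library(dplyr)

# Hypothetical toy edges; the first row is duplicated on purpose
toy_edges <- tibble(
  source = c("A", "A", "B"),
  target = c("x", "x", "y"),
  type   = c("Beneficial Owner", "Beneficial Owner", "Company Contacts")
)

# Same order as the extraction step: de-duplicate first, then count
dedup_then_count <- toy_edges %>%
  distinct() %>%
  count(source, target, type, name = "weights")

# Alternative order: count the raw rows, so duplicates show up as weights > 1
count_raw <- toy_edges %>%
  count(source, target, type, name = "weights")

dedup_then_count$weights
count_raw$weights
```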
In the code chunk below, datatable() of the DT package is used to display the mc3_edges tibble data frame as an interactive table in the HTML document.
DT::datatable(mc3_edges)
The edge table gives us an understanding of the relationship between the sources and targets. Here the source is the company, and its relationship with the target is given by the type column. There are two kinds of relationship: Beneficial Owner and Company Contacts.
Plotting the variables in edge dataframe
Below are the code chunks that use ggplot to plot the following distributions:
- Distribution of the types of relationship that exist between the source and target, and their corresponding frequencies.
#| echo: false
#| fig-width: 3
#| fig-height: 3
# Plot distribution of type
hist_type <- ggplot(data = mc3_edges,
                    aes(x = type)) +
  geom_bar() +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.1) +
  labs(title = "Distribution of Relationship Types", x = "Type", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"))
- Number of companies that a beneficial owner owns
#| echo: false
#| fig-width: 3
#| fig-height: 3
# Keep only the "Beneficial Owner" edges and count the companies per owner
mc3_edges_owner <- mc3_edges %>%
  filter(type == "Beneficial Owner") %>%
  group_by(target, type) %>%
  summarise(no_of_companies = n()) %>%
  ungroup()
# Create a ggplot histogram of the number of companies a beneficial owner owns
# binwidth = 1 gives one bar per integer count, so the labels added below line up with the bars
gg_hist_own <- ggplot(mc3_edges_owner, aes(x = no_of_companies)) +
  geom_histogram(binwidth = 1, fill = "steelblue") +
  labs(title = "No of companies beneficial owners own", x = "No of companies", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold")) +
  scale_x_continuous(breaks = seq(min(mc3_edges_owner$no_of_companies), max(mc3_edges_owner$no_of_companies), by = 1))
# Calculate frequency counts for each bin
freq_counts <- table(mc3_edges_owner$no_of_companies)
# Create a data frame for labels
label_data <- data.frame(x = as.numeric(names(freq_counts)), y = as.numeric(freq_counts))
# Add frequency labels to the plot
gg_hist_own <- gg_hist_own +
  geom_text(
    data = label_data,
    aes(x = x, y = y, label = y),
    vjust = -0.5,
    size = 3
  )
#| echo: false
#| fig-width: 3
#| fig-height: 3
#Combining the two plots using patchwork
combined_plot <- hist_type / gg_hist_own
combined_plot
As seen from the plot above, there are 16,792 Beneficial Owner edges and 7,244 Company Contacts edges.
We can also see that the majority of owners own 1 company. In fact, less than 0.5% of the beneficial owners own more than 3 companies. Such owners may be suspicious, and we will investigate them further later on. For instance, we could look at the size of their companies in terms of revenue and the number of owners each company has.
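The share of owners above a threshold can be computed directly rather than read off the histogram. The following is a minimal sketch on hypothetical toy counts (toy_owner and its values are made up); applied to mc3_edges_owner, the same expression would give the percentage quoted above.

```r
library(dplyr)

# Hypothetical toy counts: one row per beneficial owner
toy_owner <- tibble(
  target = c("o1", "o2", "o3", "o4", "o5"),
  no_of_companies = c(1, 1, 2, 4, 1)
)

# Share of owners above the threshold used in the text (more than 3 companies)
share_over_3 <- mean(toy_owner$no_of_companies > 3)
share_over_3

# Full distribution as counts and percentages
toy_owner %>%
  count(no_of_companies) %>%
  mutate(pct = 100 * n / sum(n))
```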
Creating new edge dataframe
Below is the code chunk to create a new edge dataframe called mc3_edges_with_no_of_companies, which has the no_of_companies column added in.
#| echo: false
#| fig-width: 3
#| fig-height: 3
# Join the no_of_companies column from mc3_edges_owner into mc3_edges
mc3_edges_with_no_of_companies <- mc3_edges %>%
  left_join(mc3_edges_owner %>% select(target, no_of_companies),
            by = "target") %>%
  mutate(no_of_companies = ifelse(is.na(no_of_companies), 0, no_of_companies))
# View the updated edge data frame
mc3_edges_with_no_of_companies
# A tibble: 24,036 × 5
source target type weights no_of_companies
<chr> <chr> <chr> <int> <dbl>
1 1 AS Marine sanctuary Christina Taylor Compa… 1 1
2 1 AS Marine sanctuary Debbie Sanders Benef… 1 1
3 1 Ltd. Liability Co Cargo Angela Smith Benef… 1 1
4 1 S.A. de C.V. Catherine Cox Compa… 1 0
5 1 and Sagl Forwading Angela Mendoza Compa… 1 0
6 1 and Sagl Forwading Christopher Watson Benef… 1 1
7 2 Limited Liability Company Amanda Mcdonald Benef… 1 1
8 2 Limited Liability Company Megan Padilla Compa… 1 0
9 2 Limited Liability Company Monica Martinez Compa… 1 0
10 2 Limited Liability Company Teresa Collins Benef… 1 1
# ℹ 24,026 more rows
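The join above attaches the owner-level count to every edge that shares the same target, whatever the edge type, and fills in 0 where the target never appears as a beneficial owner. A minimal sketch on hypothetical toy data (all names and values below are made up) shows the same pattern; replace_na() from tidyr is used here as one possible alternative to the ifelse() call.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy edges: owner "o1" owns two companies, "c1" is only a contact
toy_edges <- tibble(
  source = c("A", "B", "C"),
  target = c("o1", "o1", "c1"),
  type   = c("Beneficial Owner", "Beneficial Owner", "Company Contacts")
)

# One row per beneficial owner with the number of companies owned
owner_counts <- toy_edges %>%
  filter(type == "Beneficial Owner") %>%
  count(target, name = "no_of_companies")

# Attach the count to every edge; targets without ownership edges get 0
joined <- toy_edges %>%
  left_join(owner_counts, by = "target") %>%
  mutate(no_of_companies = replace_na(no_of_companies, 0L))

joined
```

Because owner_counts has one row per target, the join does not change the number of edge rows.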
4.2 Exploring the nodes data frame
In the code chunk below, skim() of skimr package is used to display the summary statistics of mc3_nodes tibble data frame.
skim(mc3_nodes)
| Name | mc3_nodes |
| Number of rows | 27622 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 6 | 64 | 0 | 22929 | 0 |
| country | 0 | 1 | 2 | 15 | 0 | 100 | 0 |
| type | 0 | 1 | 7 | 16 | 0 | 3 | 0 |
| product_services | 0 | 1 | 4 | 1737 | 0 | 3244 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| revenue_omu | 21515 | 0.22 | 1822155 | 18184433 | 3652.23 | 7676.36 | 16210.68 | 48327.66 | 310612303 | ▇▁▁▁▁ |
There are a large number of missing values in the revenue_omu column: 21,515 of the 27,622 rows are missing, a complete rate of only 0.22.
In the code chunk below, datatable() of the DT package is used to display the mc3_nodes tibble data frame as an interactive table in the HTML document.
DT::datatable(mc3_nodes)
Observing the nodes data table above, we notice that some node ids are not unique: the same id may appear with more than one country, more than one product_services entry and/or more than one revenue value. This could be one way to infer the size of a company; if it operates in more than one country and/or offers many products, it is likely to be a big company.
Handling of missing and/or unknown values
Notice that the product_services column contains character(0) entries, which are meaningless, so we replace them with “unknown”. The NA values in the revenue_omu column are replaced with 0.
#| echo: false
#| fig-width: 3
#| fig-height: 3
mc3_nodes <- mc3_nodes %>%
  mutate(product_services = ifelse(product_services == "character(0)", "unknown", product_services),
         # coalesce() keeps revenue_omu numeric; NA becomes 0
         revenue_omu = coalesce(revenue_omu, 0))
Checking for duplicate nodes and removing them
#| echo: false
#| fig-width: 3
#| fig-height: 3
# Calculate the number of duplicates in mc3_nodes
num_duplicates_nodes <- sum(duplicated(mc3_nodes))
# Display the number of duplicates
#num_duplicates_nodes
# Remove duplicates from mc3_nodes
mc3_nodes_unique <- distinct(mc3_nodes)
There are a total of 2595 duplicated nodes. These duplicated rows are removed and the result is saved as a new nodes data frame, mc3_nodes_unique.
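Note that distinct() only removes rows that are identical in every column, so an id can still appear more than once afterwards if its attributes differ. A minimal sketch on hypothetical toy nodes (toy_nodes and its values are made up) illustrates the difference between duplicated rows and repeated ids.

```r
library(dplyr)

# Hypothetical toy nodes: rows 1 and 2 are exact duplicates,
# row 3 has the same id but a different country and revenue
toy_nodes <- tibble(
  id          = c("A", "A", "A", "B"),
  country     = c("ZH", "ZH", "Oceanus", "ZH"),
  revenue_omu = c(100, 100, 250, 0)
)

# Number of rows that repeat an earlier row exactly
n_dup <- sum(duplicated(toy_nodes))

# Remove exact duplicates
toy_unique <- distinct(toy_nodes)

# ids that still occur more than once after de-duplication
repeated_ids <- toy_unique %>%
  count(id) %>%
  filter(n > 1)

n_dup
repeated_ids
```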
4.2.1 Distribution of the type of nodes
Below is the code chunk to plot the distribution of the nodes type.
#| echo: false
#| fig-width: 3
#| fig-height: 3
hist_type_node <- ggplot(data = mc3_nodes_unique,
                         aes(x = type)) +
  geom_bar() +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.1) +
  labs(title = "Distribution of Node Type", x = "Type", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"))
#hist_type_node
4.2.2 Distribution of the product_services
In this section, we will perform text sensing using appropriate functions of tidytext package.
To begin, we will employ the tokenisation process. In text sensing, tokenisation is the process of breaking up a given text into units called tokens. Tokens can be individual words, phrases or even whole sentences. In the process of tokenisation, some characters like punctuation marks may be discarded. The tokens usually become the input for the processes like parsing and text mining.
In the code chunk below, unnest_tokens() of tidytext is used to split the text in the product_services field into words.
token_nodes <- mc3_nodes_unique %>%
  unnest_tokens(word,
                product_services)
The two basic arguments to unnest_tokens() used here are column names. First we have the output column name that will be created as the text is unnested into it (word, in this case), and then the input column that the text comes from (product_services, in this case).
By default, punctuation is stripped. (Pass strip_punct = FALSE, which is forwarded to the underlying word tokenizer, to turn off this behavior.)
By default, unnest_tokens() converts the tokens to lowercase, which makes them easier to compare or combine with other datasets. (Use the to_lower = FALSE argument to turn off this behavior.)
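The two defaults described above can be seen on a small example. This is a minimal sketch using a hypothetical two-row tibble (toy_text is made up for illustration), not the actual product_services values.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy text with mixed case and punctuation
toy_text <- tibble(
  id = c("A", "B"),
  product_services = c("Fresh Fish, frozen SEAFOOD", "Shoes and bags")
)

# One row per word; the other columns (here id) are repeated for each token
tokens <- toy_text %>%
  unnest_tokens(word, product_services)

tokens$word
```

The comma is dropped and all tokens come back in lowercase, so "Fish" and "SEAFOOD" become "fish" and "seafood".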
Now we can visualise the words extracted by using the code chunk below.
token_nodes %>%
  count(word, sort = TRUE) %>%
  top_n(5) %>%
  mutate(word = reorder(word, n))
# A tibble: 5 × 2
word n
<fct> <int>
1 unknown 21009
2 and 6389
3 products 1860
4 of 881
5 as 752
The tibble data frame above reveals that the most frequent words include some that are not useful for analysis, for instance “and” and “of”. In the world of text mining, we call such words stop words. The tidytext package provides a dataset called stop_words that can help us remove them.
stopwords_removed <- token_nodes %>%
  anti_join(stop_words, by = "word")
There are two processes:
- Load the stop_words data included with tidytext. This data is simply a list of words that you may want to remove in a natural language analysis.
- Then anti_join() of the dplyr package is used to remove all stop words from the analysis.
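The effect of the anti-join can be checked on a handful of tokens. The sketch below uses a hypothetical toy_tokens tibble (made up for illustration); only the stop_words dataset comes from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy tokens mixing content words and stop words
toy_tokens <- tibble(word = c("fish", "and", "products", "of", "seafood"))

# Keep only the rows whose word has no match in stop_words
kept <- toy_tokens %>%
  anti_join(stop_words, by = "word")

kept$word
```

Naming the join column with by = "word" avoids the message dplyr prints when it has to guess the key.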
stopwords_removed %>%
  filter(!word %in% c("unknown", "services", "related", "including", "offers", "range")) %>% # filter away meaningless words
  count(word, sort = TRUE) %>%
  top_n(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  # labs() names the aesthetics, not the screen axes: x is the word, y is the count
  labs(x = "Unique words",
       y = "Count",
       title = "Count of unique words found in product_services field")
The code chunk below helps us categorise product_services into fishing-related, non-fishing related and unknown for analysis.
# Create a list of fishing-related words
include_words <- c("fish", "fishing", "seafood", "seafoods","prawns","prawn", "salmon","tuna","shrimp","shrimps","crab","squid","oyster","clam","mollusks","crustaceans","roe","fillet","haddock","octopus","herring","lobsters","seabass","cephalopods","cod","shellfish","shark","chum")
# Use grepl() to flag whether any word in include_words occurs as a whole word in product_services, and store the result in a new column called category
# The "unknown" check is made case-insensitive because the placeholder was stored in lowercase earlier
mc3_nodes_unique$category <- ifelse(grepl(paste0("\\b", paste(include_words, collapse = "\\b|\\b"), "\\b"),
                                          tolower(mc3_nodes_unique$product_services)),
                                    "Fishing-related",
                                    ifelse(tolower(mc3_nodes_unique$product_services) == "unknown",
                                           "Unknown",
                                           "Non-fishing related"))
Next, we look at the distribution of the categories and find the median revenue for each category. The pie chart shows that only a small percentage (4%) of companies offer fishing-related product_services. The bar chart shows that the median revenue for the fishing-related category is around 29,811.38 OMU.
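The whole-word matching used for the category column can be checked on a few strings. This is a minimal sketch with a shortened, hypothetical word list and made-up service descriptions; it is not the full include_words vector.

```r
# Shortened, hypothetical word list for illustration
toy_words <- c("fish", "seafood", "tuna")

# \\b marks a word boundary, so "fish" does not match inside "selfish"
pattern <- paste0("\\b(", paste(toy_words, collapse = "|"), ")\\b")

# Hypothetical service descriptions
toy_services <- c("Frozen tuna and seafood", "Selfish gifts", "unknown", "Footwear")

toy_category <- ifelse(
  grepl(pattern, tolower(toy_services)),
  "Fishing-related",
  ifelse(tolower(toy_services) == "unknown", "Unknown", "Non-fishing related")
)

data.frame(toy_services, toy_category)
```

Because matching is on whole words, plural and derived forms such as "fishing" or "shrimps" have to be listed explicitly, as the word list in this section does.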
#| echo: false
#| fig-width: 3
#| fig-height: 3
library(dplyr)
library(ggplot2)
library(ggrepel)
# Define the colors for each category
category_colors <- c("Fishing-related" = "#B4D4E7", "Non-fishing related" = "#B4E7BD", "Unknown" = "#D3D3D3")
# Set the category as a factor with desired order
category_freq <- mc3_nodes_unique %>%
mutate(category = factor(category, levels = c("Fishing-related", "Non-fishing related", "Unknown"))) %>%
count(category) %>%
mutate(percentage = prop.table(n) * 100)
# Create a pie chart with labels
ggplot_cat <- ggplot(category_freq, aes(x = "", y = n, fill = category)) +
geom_bar(width = 1, stat = "identity", color = "black") +
coord_polar(theta = "y") +
xlab("") +
ylab("") +
labs(title = "Distribution of Category") +
theme_void() +
theme(legend.position = "right",
plot.title = element_text(hjust = 0.5, face = "bold")) +
geom_label_repel(aes(label = paste0(category, "\nCount: ", n, "\n", round(percentage, 1), "%")),
box.padding = 0.5,
point.padding = 0.1,
segment.color = "black",
show.legend = FALSE,
label.color = "black") +
scale_fill_manual(values = category_colors)
#| echo: false
#| fig-width: 3
#| fig-height: 3
#Convert revenue_omu to numeric
mc3_nodes_unique <- mc3_nodes_unique %>%
mutate(revenue_omu = as.numeric(revenue_omu))
# Define the colors for each category
category_colors <- c("Fishing-related" = "#B4D4E7", "Non-fishing related" = "#B4E7BD", "Unknown" = "#D3D3D3")
# Calculate the median revenue_omu for each category
median_revenue <- mc3_nodes_unique %>%
group_by(category) %>%
filter(category != "Non-fishing related" | (category == "Non-fishing related" & revenue_omu != 0 & !is.na(revenue_omu))) %>%
summarize(median_revenue_omu = median(revenue_omu, na.rm = TRUE))
# Plot the bar chart
ggplot_rev <- ggplot(median_revenue, aes(x = category, y = median_revenue_omu, fill = category)) +
geom_col() +
scale_fill_manual(values = category_colors) +
xlab("Category") +
ylab("Median Revenue (OMU)") +
labs(title = "Median Revenue by Category") +
theme_bw() +
theme(plot.title = element_text(face = "bold")) +
geom_text(aes(label = round(median_revenue_omu, 2)), vjust = -0.5)
#| echo: false
#| fig-width: 3
#| fig-height: 3
combined_plot2 <- ggplot_cat / ggplot_rev
combined_plot2
#| echo: false
#| fig-width: 3
#| fig-height: 3
## Adding the `mc3_nodes_unique` attributes, consider both beneficial owners and company contacts
filtered_mc3_edges <- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3)
# Create a data frame with source nodes and rename column
id4 <- filtered_mc3_edges %>%
  select(source) %>%
  rename(id = source) %>%
  mutate(type_node = "company")
# Create a data frame with target nodes and rename column
id5 <- filtered_mc3_edges %>%
  select(target, type) %>%
  rename(id = target, type_node = type)
# Combine the two data frames and remove duplicates
mc3_nodes3 <- rbind(id4, id5) %>%
  distinct() %>%
  left_join(mc3_nodes_unique,
            by = "id",
            unmatched = "drop") %>%
  distinct()
# Make sure revenue is numeric and replace missing values with 0
mc3_nodes3 <- mc3_nodes3 %>%
  mutate(revenue_omu = coalesce(as.numeric(revenue_omu), 0))
# Calculate the revenue threshold for the top 10% (the 90th percentile)
revenue_threshold <- quantile(mc3_nodes3$revenue_omu, probs = 0.90, na.rm = TRUE)
# Filter the data frame to retain only the rows with revenue above the threshold
filtered_mc3_nodes <- mc3_nodes3[mc3_nodes3$revenue_omu > revenue_threshold, ]
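A quantile with probs = 0.90 is the cut-off below which about 90% of the values fall, so keeping rows above it retains roughly the top 10%. The following is a minimal sketch on a hypothetical revenue vector (toy_revenue is made up), using R's default quantile type.

```r
# Hypothetical revenues, including zeros that stand in for missing values
toy_revenue <- c(0, 0, 10, 20, 30, 40, 50, 60, 70, 1000)

# 90th percentile using the default (type 7) interpolation
toy_threshold <- quantile(toy_revenue, probs = 0.90, na.rm = TRUE)

# Values strictly above the threshold
toy_top <- toy_revenue[toy_revenue > toy_threshold]

toy_threshold
toy_top
```

With many zeros in the data the threshold is pulled down, which is worth keeping in mind when reading the resulting chart.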
#| echo: false
#| fig-width: 5
#| fig-height: 6
# Create a bar chart of revenue vs ID using ggplot
bar_plot_toprev <- ggplot(filtered_mc3_nodes, aes(x = reorder(id, revenue_omu), y = revenue_omu/1000)) +
geom_bar_interactive(aes(tooltip = paste("ID:", id,
"<br>Type:", type_node,
"<br>Country:", country,
"<br>Revenue:", revenue_omu,
"<br>Product Services:", product_services)),
stat = "identity", fill = "steelblue") +
labs(x = "id", y = "Revenue_omu ('000)", title = "Top 10% ids") +
coord_flip() +
theme(plot.title = element_text(face = "bold"))+
theme(axis.text.y = element_text(size = 6))
# Print the bar plot
girafe(ggobj = bar_plot_toprev,
       width_svg = 8,
       height_svg = 8*0.618)
5. Network Visualisation and Analysis
5.1 Building network model with tidygraph for Beneficial Owners
Our earlier analysis of the edge data frame found that less than 0.5% of the beneficial owners own more than 3 companies. Because this is unusual, we investigate these owners further by plotting a network graph and examining their relationships with other owners and/or companies.
Preparing edge data table
# Keep the beneficial owners that have more than 3 companies
filtered_mc3_edges_owner <- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3, type == "Beneficial Owner")
Preparing nodes data table
Instead of using the nodes data table extracted from mc3_data, we will prepare a new nodes data table by using the source and target fields of filtered_mc3_edges_owner data table. This is necessary to ensure that the nodes in nodes data tables include all the source and target values.
#| echo: false
#| fig-width: 3
#| fig-height: 3
# Create a data frame with source nodes and rename column
id1 <- filtered_mc3_edges_owner %>%
select(source) %>%
rename(id = source) %>%
mutate(type_node = "company")
# Create a data frame with target nodes and rename column
id2 <- filtered_mc3_edges_owner %>%
select(target, type) %>%
rename(id = target, type_node = type)
# Combine the two data frames and remove duplicates
mc3_nodes1 <- rbind(id1, id2) %>%
  distinct()
Tidygraph model
mc3_graph <- tbl_graph(nodes = mc3_nodes1,
                       edges = filtered_mc3_edges_owner,
                       directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
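To make the betweenness measure concrete, here is a minimal sketch on a hypothetical toy graph in which one owner is linked to three companies (all names are made up). For clarity the edge columns are named from and to and node_key is set explicitly; the model above relies on the default column handling instead.

```r
library(dplyr)
library(tidygraph)

# Hypothetical nodes: one owner and three companies
toy_nodes <- tibble(
  id = c("Owner X", "Co 1", "Co 2", "Co 3"),
  type_node = c("Beneficial Owner", "company", "company", "company")
)

# Hypothetical edges: every company is linked to the same owner
toy_edges <- tibble(
  from = c("Co 1", "Co 2", "Co 3"),
  to   = c("Owner X", "Owner X", "Owner X")
)

toy_graph <- tbl_graph(nodes = toy_nodes,
                       edges = toy_edges,
                       directed = FALSE,
                       node_key = "id") %>%
  mutate(betweenness_centrality = centrality_betweenness())

# Extract the node table with the computed centrality
toy_result <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()

toy_result
```

The owner lies on the shortest path between every pair of companies, so it gets the highest betweenness while the companies get 0. This is the pattern that the node sizes in the plots of this section are meant to surface.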
#| echo: false
#| fig-width: 4
#| fig-height: 4
# Preparing edges tibble data frame
edges_df <- mc3_graph %>%
  activate(edges) %>%
  as_tibble()
# Preparing nodes tibble data frame
# visNetwork expects an integer id column and uses label for the node text
nodes_df <- mc3_graph %>%
  activate(nodes) %>%
  as_tibble() %>%
  rename(label = id) %>%
  mutate(id = row_number()) %>%
  relocate(id, .before = label)
nodes_df <- nodes_df %>%
  rename(group = type_node)
# Plot the network graph with labeled nodes using visNetwork
visNetwork(nodes_df, edges_df, main = list(text = "Network Graph of Company and Beneficial Owner",
                                           style = "color: black; font-weight: bold; text-align: center;")) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>%
  visGroups(groupname = "company", shape = "icon",
            icon = list(code = "f0f7", color = "#000000")) %>%
  visGroups(groupname = "Beneficial Owner", shape = "icon",
            icon = list(code = "f2bd")) %>%
  visLegend() %>%
  visOptions(
    highlightNearest = TRUE,
    nodesIdSelection = TRUE
  ) %>%
  visInteraction(
    zoomView = TRUE,
    dragNodes = TRUE,
    dragView = TRUE,
    navigationButtons = TRUE,
    selectable = TRUE, # Enable node selection
    hover = TRUE # Enable hover effects
  )
#| echo: false
#| fig-width: 4
#| fig-height: 4
# Set a seed for reproducibility
set.seed(123)
ggraph_own <- mc3_graph %>%
  ggraph(layout = "fr") +
  # Constant alpha and colour are set outside aes() so they are not treated as data
  geom_edge_link(alpha = 0.5) +
  geom_node_point(aes(size = betweenness_centrality),
                  colour = "lightblue",
                  alpha = 0.5) +
  scale_size_continuous(range = c(1, 10)) +
  theme_graph()
ggraph_own
Top 10 Owners
filtered_mc3_edges_owner
# A tibble: 313 × 5
source target type weights no_of_companies
<chr> <chr> <chr> <int> <dbl>
1 Acevedo, Dickson and Gonzalez Richard Smith Bene… 1 6
2 Adams Group John Smith Bene… 1 9
3 Adams-Pope Michelle Rodr… Bene… 1 4
4 Adriatic Catch S.A. de C.V. David Jones Bene… 1 6
5 Albertine Rift NV Family Michael Taylor Bene… 1 4
6 Alexander PLC David Jones Bene… 1 6
7 Alvarez Ltd Michael Carter Bene… 1 5
8 Alvarez, Young and Ramos Michael Miller Bene… 1 5
9 Ancla del Este Ltd. Liability Co Aaron Jones Bene… 1 4
10 Ancla del Este Sp Fish John Jones Bene… 1 4
# ℹ 303 more rows
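# List the owners in descending order of the number of companies they own
top_10_ids <- filtered_mc3_edges_owner %>%
  select(target, no_of_companies) %>%
  distinct() %>%
  arrange(desc(no_of_companies))
DT::datatable(top_10_ids)
John Smith (9 companies in the edge table above) and Michael Johnson, for instance, are among the beneficial owners with the most companies.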
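The table above lists every owner with more than 3 companies in descending order. If only the highest-ranked owners are wanted, slice_max() from dplyr is one option. The sketch below uses hypothetical toy counts (toy_counts is made up) and keeps the top 2 instead of the top 10 to stay small.

```r
library(dplyr)

# Hypothetical toy counts: one row per owner
toy_counts <- tibble(
  target = c("a", "b", "c", "d"),
  no_of_companies = c(4, 9, 6, 9)
)

# Keep the n owners with the largest counts; ties at the cut-off are kept
toy_top <- toy_counts %>%
  slice_max(no_of_companies, n = 2, with_ties = TRUE)

toy_top
```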
5.2 Building network model with tidygraph for Company Contacts
Similarly, to plot the network graph of Company and Company Contacts, we follow the same steps as above.
#| echo: false
#| fig-width: 4
#| fig-height: 4
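# Keep the "Company Contacts" edges to create the edge data table
# no_of_companies comes from the Beneficial Owner counts, so this keeps contacts who also own more than 3 companies
mc3_edges_cc <- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3, type == "Company Contacts")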
# Create the nodes data table
# Create a data frame with source nodes and rename column
id3 <- mc3_edges_cc %>%
select(source) %>%
rename(id = source) %>%
mutate(type_node = "company")
# Create a data frame with target nodes and rename column
id4 <- mc3_edges_cc %>%
select(target, type) %>%
rename(id = target, type_node = type)
# Combine the two data frames and remove duplicates
mc3_nodes2 <- rbind(id3, id4) %>%
distinct()
#Building the tidygraph model for company contacts
mc3_graph2 <- tbl_graph(nodes = mc3_nodes2,
                        edges = mc3_edges_cc,
                        directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
#| echo: false
#| fig-width: 4
#| fig-height: 4
# Preparing edges tibble data frame
edges_df_2 <- mc3_graph2 %>%
  activate(edges) %>%
  as_tibble()
# Preparing nodes tibble data frame
# visNetwork expects an integer id column and uses label for the node text
nodes_df_2 <- mc3_graph2 %>%
  activate(nodes) %>%
  as_tibble() %>%
  rename(label = id) %>%
  mutate(id = row_number()) %>%
  relocate(id, .before = label)
nodes_df_2 <- nodes_df_2 %>%
  rename(group = type_node)
# Plot the network graph with labeled nodes using visNetwork
visNetwork(nodes_df_2, edges_df_2, main = list(text = "Network Graph of Company and Company Contacts",
                                               style = "color: black; font-weight: bold; text-align: center;")) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>%
  visGroups(groupname = "company", shape = "icon",
            icon = list(code = "f0f7", color = "#000000")) %>%
  visGroups(groupname = "Company Contacts", shape = "icon",
            icon = list(code = "f0c0")) %>%
  visOptions(
    highlightNearest = TRUE,
    nodesIdSelection = TRUE
  ) %>%
  visLegend() %>%
  visInteraction(
    zoomView = TRUE,
    dragNodes = TRUE,
    dragView = TRUE,
    navigationButtons = TRUE,
    selectable = TRUE, # Enable node selection
    hover = TRUE # Enable hover effects
  )
#| echo: false
#| fig-width: 4
#| fig-height: 4
# Set a seed for reproducibility
set.seed(123)
mc3_graph2 %>%
  ggraph(layout = "fr") +
  # Constant alpha and colour are set outside aes() so they are not treated as data
  geom_edge_link(alpha = 0.5) +
  geom_node_point(aes(size = betweenness_centrality),
                  colour = "lightblue",
                  alpha = 0.5) +
  scale_size_continuous(range = c(1, 10)) +
  theme_graph()